SUMMARY PAGE


THE PROBLEM

To  develop  a  task  and  stimulus  set  that  can  be  used  for 
studying  aural  classification  of  transient  sonar  signals. 


THE  FINDINGS 

We extracted fifty one-second segments from extended
recordings  of  underwater  acoustic  events.  Using  transcripts  of 
the  recording  sessions  and  the  judgments  of  two  sonar  operators, 
each  of  these  fifty  signals  was  put  into  one  of  eight  categories. 
Two  listeners  were  able  to  categorize  these  fifty  signals, 
presented  individually. 


APPLICATION 

These  preliminary  results  suggest  that  this  set  of  fifty 
signals  and  eight  categories  can  be  used  for  further  testing  of 
aural  classification  ability. 


ADMINISTRATIVE INFORMATION

This  research  was  conducted  under  Office  of  Naval  Research 
Work unit 61153N-KR4209.001-CNR4424207. It was submitted for
review  on  6  January  1989,  approved  for  release  on  23  June  1989, 
and  designated  as  NSMRL  Report  No.  1142. 
ABSTRACT


We extracted fifty one-second segments from extended
recordings of underwater acoustic events. Using transcripts of
the recording sessions and the judgments of two sonar operators,
we assigned each of these fifty signals to one of eight categories.
Two additional listeners were tested on their ability to
categorize these fifty signals when presented individually.
Feedback was given for three exemplars from each of the eight
categories. The other twenty-six signals were used as probe
stimuli to test listeners' abilities to generalize each category to
other stimuli for which they did not receive feedback. Listeners
performed well on the task. They attained 98.0% correct judgments
on the exemplars and 88.1% correct judgments on the probes. The
two listeners showed similar patterns of errors. Finally, we
report anecdotal evidence concerning the role of attentional
processes in the classification of these stimuli. The results
suggest that this stimulus set and classification task are
appropriate for further testing to determine stimulus features and
attentional processes underlying aural classification.
The auditory system obtains important information about the
environment by classifying acoustic events into meaningful
categories. However, most research on auditory processes bears
only indirectly on the ability to classify acoustic signals.
Traditionally, psychoacoustics has studied basic properties of
the sensory system with simple tasks, such as detection or
discrimination, and has used simple stimuli, such as tones and
white noise. With the exception of speech, classification studies
have tended to use stimulus sets synthesized with arbitrary
stimulus dimensions. The present paper is a preliminary report on
the development of a classification task to be used to examine
auditory features and processes with realistically complex
signals. We report on listeners' abilities to classify a set of
stimuli selected from underwater recordings of brief acoustic
events. To the extent that listeners can categorize these
stimuli, there are potentially important perceptual distinctions
being made by the auditory system. Real-world signals have the
advantage that they possess the complexity of typical auditory
signals rather than the artificial structure of synthetic signals.
Thus, the perceptual distinctions, or features, that the auditory
system uses with these stimuli are potentially important
properties of other stimuli as well.


METHOD 

Signals. Fifty one-second segments were extracted from digitized
underwater recordings of transient signals. The signals ranged in
duration from tens to hundreds of milliseconds and were
approximately centered within the one-second sample. The
recordings had been digitized at a 12.5 kHz sampling rate with 12
bits of linear encoding of amplitude. A preliminary
classification of the fifty events into eight categories was
performed based on transcripts of the recording sessions and in
consultation with two sonar operators who listened to the
recordings in their original context (prior to extraction). For
each of the eight categories, three exemplars were chosen that
represented good-quality samples with minimal ambiguity regarding
the accuracy of the classification.
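
The extraction step can be illustrated with a minimal sketch. It assumes the recording is available as a plain sequence of samples at the 12.5 kHz rate given above and that the sample index of the event has already been marked by hand. The function name `extract_segment` and its parameters are hypothetical and are not taken from the original procedure.

```python
def extract_segment(samples, event_center, fs=12500, duration_s=1.0):
    """Return a duration_s-long slice of `samples` roughly centered on
    `event_center` (a sample index). Hypothetical helper for illustration."""
    n = int(round(fs * duration_s))  # 12500 samples for one second at 12.5 kHz
    if len(samples) < n:
        raise ValueError("recording is shorter than the requested segment")
    start = event_center - n // 2
    # Clamp so the segment stays inside the recording; near either end the
    # event is then only approximately centered.
    start = max(0, min(start, len(samples) - n))
    return samples[start:start + n]


recording = list(range(50000))            # stand-in for 4 s of digitized audio
segment = extract_segment(recording, 25000)
print(len(segment), segment[0])
```

Clamping at the edges is a design choice of this sketch; the report says only that events were approximately centered.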

Apparatus. Stimuli were presented over 16-bit digital-to-analog
converters  and  low-pass  filtered  at  5  kHz.  A  programmable 
attenuator  adjusted  the  amplitude  of  each  signal  to  a  comfortable 
listening  level.  An  electronic  switch  gated  the  stimuli  with 
20-ms,  sine-squared  ramping.  Stimuli  were  presented  to  the  right 
earphone  of  a  Sennheiser  HD430  headset. 
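
The 20-ms sine-squared gating was done by an electronic switch, but its effect on the waveform can be sketched in software. The sketch below assumes playback at the 12.5 kHz digitization rate, which the apparatus description does not state, and the function name `gate_sine_squared` is hypothetical.

```python
import math


def gate_sine_squared(samples, fs=12500, ramp_s=0.020):
    """Apply sine-squared onset and offset ramps of ramp_s seconds.
    Software illustration only; the report used a hardware switch."""
    n_ramp = int(round(fs * ramp_s))  # 250 samples at the assumed 12.5 kHz
    if 2 * n_ramp > len(samples):
        raise ValueError("signal is too short for the requested ramps")
    out = list(samples)
    for i in range(n_ramp):
        gain = math.sin(0.5 * math.pi * i / n_ramp) ** 2  # rises from 0 toward 1
        out[i] *= gain        # onset ramp
        out[-1 - i] *= gain   # mirrored offset ramp
    return out


gated = gate_sine_squared([1.0] * 12500)
print(gated[0], gated[125], gated[6000])
```

The gain is zero at the first and last sample, about one half midway through each ramp, and the samples between the ramps are left unchanged.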

Procedure. On each trial, one stimulus was presented and the
listener classified it into one of four or eight categories.
Training proceeded in two phases. In the first, only the
exemplars from categories one through four were presented within
certain blocks and only exemplars from categories five through
eight were presented within the other blocks. Feedback was given
on every trial. These reduced sets were used to facilitate the
learning of category labels. After several blocks of trials with
each set, a second phase of training began in which exemplars
from all eight categories were mixed within a block of 72 trials.
Feedback was again given on every trial. After listeners attained
essentially perfect performance on this task, the test phase of
the experiment began. Each of the fifty stimuli was presented
once within a block of fifty trials. If an exemplar was
presented, feedback was given. However, if the stimulus was one
of the twenty-six stimuli that were not exemplars (such stimuli
will be called probe stimuli), no feedback was given. Twenty-four
blocks of trials were collected during the test phase. These
blocks were run over three days. One practice block with only the
exemplar signals was run at the beginning of each day.
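
The structure of the test phase can be summarized in a short sketch. Random ordering within a block is an assumption (the order of presentation is not stated), and the stimulus identifiers and function names are hypothetical.

```python
import random


def make_stimulus_set():
    """24 exemplars (3 per category, 8 categories) plus 26 probes."""
    stimuli = [
        {"id": f"exemplar-{cat}-{k}", "is_exemplar": True}
        for cat in range(1, 9)
        for k in range(1, 4)
    ]
    stimuli += [{"id": f"probe-{k}", "is_exemplar": False} for k in range(1, 27)]
    return stimuli


def make_test_block(stimuli, rng):
    """One block: every stimulus once, feedback only on exemplar trials."""
    order = list(stimuli)
    rng.shuffle(order)  # assumed: presentation order randomized per block
    return [{"stimulus": s["id"], "feedback": s["is_exemplar"]} for s in order]


rng = random.Random(0)
stimuli = make_stimulus_set()
blocks = [make_test_block(stimuli, rng) for _ in range(24)]
exemplar_trials = sum(t["feedback"] for b in blocks for t in b)
probe_trials = sum(not t["feedback"] for b in blocks for t in b)
print(exemplar_trials, probe_trials)  # 576 624
```

Over the 24 test blocks this gives 576 exemplar trials and 624 probe trials per listener, that is, 72 trials for each category's three exemplars.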

Listeners. Two laboratory personnel, one of whom was the author,
served as listeners. Each had normal hearing sensitivity (less
than 15 dB HL at octave frequencies from 250 to 8000 Hz). Both
had been involved in the stimulus preparation and were very
familiar with the signals prior to testing.


RESULTS 

After fewer than 500 practice trials, both listeners were at
essentially 100% correct classification of the 24 exemplars into
the eight categories. Naive listeners would presumably require
more training time. Table I shows the confusion matrix for the 24
exemplars from the practice blocks that contained all eight
categories (including the practice block collected prior to the
test blocks on each day). The first listener made too few errors
to show any consistent pattern of confusions. The second
listener's data show a more definite pattern. Her confusions tend
towards two groupings. Confusions exist among the stimulus
categories 1, 2, 3, and 7 as one grouping and among categories 4,
6, and 8 as a second grouping. In fact, confusions for the first
listener are generally consistent with this pattern. This
consistency suggests that the stimuli are perceived and
categorized similarly by the two listeners, albeit more reliably
by the first listener.
Table I


Confusion  matrix  for  the  practice  blocks.  Each  entry 
represents  the  number  of  times  a  listener  gave  a  particular 
response  to  a  particular  stimulus. 


Listener  1: 


Stimulus 

1 

2 

3 

4 

5 

6 

7 

8 


1 

72 


1 


Response 

2  3  4  5  6  7 


8 


Listener  2: 


12  3 

Stimulus 

1  69  -  3 

2  1  64  4 

3  1  7  64 

4 

5 

6 

7  -  2 

8 


Response 
4  5  6 


70  -  2 

72  - 

68 

4 


7  8 


3 


70 


4 

68 


3 


Table II shows the confusion matrix for the 24 exemplars from
the test phase. As expected, both listeners had fewer confusions
than during the practice sessions, but those that were made are
generally consistent with the pattern from the practice sessions.
The second listener's confusions are among categories 1, 2, 3, and
7 or among 4, 6, and 8. The first listener had only five
confusions out of almost 600 trials, too few to evaluate a
pattern.
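
As a consistency check, percent correct can be computed from a confusion matrix as the diagonal total over the grand total. The sketch below uses the second listener's exemplar counts as they can be read from Table II; the placement of the off-diagonal entries is a reading of the scanned table and should be treated as an assumption. The first listener is represented only by the five confusions stated above, out of 576 exemplar trials (24 exemplars in 24 blocks).

```python
# Rows are stimulus categories 1-8, columns are response categories 1-8.
# Listener 2, exemplars, test phase (as read from Table II; an assumption).
LISTENER_2_EXEMPLARS = [
    [72, 0, 0, 0, 0, 0, 0, 0],
    [2, 69, 0, 0, 0, 0, 1, 0],
    [0, 4, 68, 0, 0, 0, 0, 0],
    [0, 0, 0, 72, 0, 0, 0, 0],
    [0, 0, 0, 0, 72, 0, 0, 0],
    [0, 0, 0, 0, 0, 72, 0, 0],
    [0, 0, 0, 0, 0, 0, 72, 0],
    [0, 0, 0, 2, 0, 9, 0, 61],
]


def percent_correct(matrix):
    """Diagonal total divided by grand total, as a percentage."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def confusions(matrix):
    """List (stimulus, response, count) for every nonzero off-diagonal cell."""
    return [
        (s + 1, r + 1, n)
        for s, row in enumerate(matrix)
        for r, n in enumerate(row)
        if s != r and n > 0
    ]


trials_per_listener = 24 * 24                      # 24 exemplars x 24 blocks
listener_1_correct = trials_per_listener - 5       # five confusions (from text)
listener_2_correct = sum(LISTENER_2_EXEMPLARS[i][i] for i in range(8))
pooled = 100.0 * (listener_1_correct + listener_2_correct) / (2 * trials_per_listener)
print(round(pooled, 1))  # 98.0
```

Pooling the two listeners in this way reproduces the 98.0% figure reported for the exemplars.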


Table II


Confusion  matrix  for  the  exemplars  from  the  test  blocks. 
Each  entry  represents  the  number  of  times  a  listener  gave  a 
particular  response  to  a  particular  stimulus. 


Listener  1: 


Stimulus 

1 

2 

3 

4 

5 

6 

7 

8 


Response 

1  2  3  4  5  6  7  8 


701---1- 
171-  —  -  —  - 

--71--1- 
---72--- 
—  -  —  —  711- 

-----72- 
------  72 


Listener 2:

                        Response
Stimulus    1    2    3    4    5    6    7    8
   1       72    -    -    -    -    -    -    -
   2        2   69    -    -    -    -    1    -
   3        -    4   68    -    -    -    -    -
   4        -    -    -   72    -    -    -    -
   5        -    -    -    -   72    -    -    -
   6        -    -    -    -    -   72    -    -
   7        -    -    -    -    -    -   72    -
   8        -    -    -    2    -    9    -   61


Listeners were also able to classify the probe stimuli
consistently, although not with the same reliability as for the
exemplars. For twenty-three out of twenty-six stimuli, both
listeners had the same modal response, i.e., the most common
response was the same. For fifteen of the stimuli, the modal
response represents more than 95% of the responses. For 19 of the
stimuli, the modal response represents at least 85% of the
responses. Five of the seven stimuli that had fewer than 85% of
their responses in a single category were poor stimuli in that
either 1) an extraneous event was included in the 1-sec sample, 2)
the preliminary classification of the event was ambiguous, or 3)
the original recording was muffled. Nonetheless, the modal
response agreed with the preliminary classification of each probe
except for those three stimuli where the modal response was
different for the two listeners, in which case one listener's
modal response differed from the preliminary classification.
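
The modal-response measure used above can be sketched as follows. The response counts in the example are hypothetical and are not taken from the data; each assumes 24 presentations of a single probe. The function names are likewise illustrative.

```python
def modal_response(counts):
    """counts maps response category -> number of responses to one stimulus.
    Returns the most common category and its share of all responses."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded for this stimulus")
    category = max(counts, key=counts.get)
    return category, counts[category] / total


def reliability_band(share):
    """Bin a modal share using the 95% and 85% cutoffs quoted in the text."""
    if share > 0.95:
        return "above 95%"
    if share >= 0.85:
        return "85% to 95%"
    return "below 85%"


# Hypothetical probes, 24 presentations each.
for counts in ({3: 24}, {6: 21, 8: 3}, {4: 13, 6: 11}):
    category, share = modal_response(counts)
    print(category, round(share, 3), reliability_band(share))
```

Two listeners agree on a probe, in the sense used here, when `modal_response` returns the same category for both of their count tables.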


Table IIIa


Confusion  matrix  for  the  probe  stimuli  from  the  test  blocks. 
Each  entry  represents  the  number  of  times  a  listener  gave  a 
particular  response  to  a  particular  stimulus. 


Listener  1: 


Stimulus 

1 

2 

3 

4 

5 

6 

7 

8 


12  3 

48 

40 
1 


Response 
4  5 


7 

118  - 
98 


6  7  8 


1 

1 

6-16 


119  -  25 

48  - 

-  -  96 


Listener 2:

                        Response
Stimulus    1    2    3    4    5    6    7    8
   1       47    1    -    -    -    -    -    -
   2        -   40    8    -    -    -    -    -
   3        -    6   96    -    -   17    -    1
   4        -    -    -  104    -   14    -    2
   5        -    -    -    -    -    -    -    -
   6        -    -    -   15    -  120    -    9
   7        -    3    -    6    -    -   39    -
   8        -    -    -    -    -   10    -   86
Table IIIa shows the confusion matrix for each listener for
the probe stimuli. Once again, the first listener classified the
stimuli more reliably than the second. Although many of the
confusions are consistent with those made with the exemplars, many
are not. Most of these new confusions are due to the 5 stimuli
that might be considered poor stimuli based on the criteria
discussed in the previous paragraph. Table IIIb shows the
confusion matrix when these five stimuli are excluded from the
analysis. With one exception, the most common confusions are
among categories 1, 2, 3, and 7 as one grouping and among
categories 4, 6, and 8 as a separate grouping. These confusions
are the same as those found with the exemplars. The exception is
that each listener once responded category 8 to a category 3
stimulus. Many of the confusions are a response of category 6 or
8 to a category 4 stimulus. In summary, classification of the
probes is quite good, with minor exceptions probably due to the
quality of original signals. Listeners' classifications agree
with those based on prior listening in a fuller context using
recording transcripts. Furthermore, the pattern of confusions is
similar for exemplars and probes.
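
The grouping claim can be checked by counting how many confusions fall inside the two groupings and how many fall outside. The sketch below uses the second listener's counts as they can be read from Table IIIb; the matrix is a reading of the scanned table and the function names are hypothetical.

```python
# Confusion groupings described in the text.
GROUPS = [{1, 2, 3, 7}, {4, 6, 8}]

# Listener 2, probes with five stimuli excluded (as read from Table IIIb).
# Rows are stimulus categories 1-8, columns are response categories 1-8.
LISTENER_2_PROBES = [
    [47, 1, 0, 0, 0, 0, 0, 0],
    [0, 20, 4, 0, 0, 0, 0, 0],
    [0, 0, 71, 0, 0, 0, 0, 1],
    [0, 0, 0, 104, 0, 14, 0, 2],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 109, 0, 9],
    [0, 1, 0, 0, 0, 0, 23, 0],
    [0, 0, 0, 0, 0, 10, 0, 86],
]


def same_group(a, b):
    """True when categories a and b belong to the same grouping."""
    return any(a in g and b in g for g in GROUPS)


def split_confusions(matrix):
    """Return (within-group, outside-group) totals of off-diagonal counts."""
    within = outside = 0
    for s, row in enumerate(matrix, start=1):
        for r, n in enumerate(row, start=1):
            if s == r:
                continue
            if same_group(s, r):
                within += n
            else:
                outside += n
    return within, outside


print(split_confusions(LISTENER_2_PROBES))  # (43, 1)
```

On this reading, the only confusion outside the groupings is the single category 8 response to a category 3 stimulus noted in the text.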

Table IIIb

Confusion  matrix  for  the  probe  stimuli  from  the  test  blocks. 
Five  stimuli  have  been  excluded  from  the  analysis.  Each  entry 
represents  the  number  of  times  a  listener  gave  a  particular 
response  to  a  particular  stimulus. 


Listener 2:

                        Response
Stimulus    1    2    3    4    5    6    7    8
   1       47    1    -    -    -    -    -    -
   2        -   20    4    -    -    -    -    -
   3        -    -   71    -    -    -    -    1
   4        -    -    -  104    -   14    -    2
   5        -    -    -    -    -    -    -    -
   6        -    -    -    2    -  109    -    9
   7        -    1    -    -    -    -   23    -
   8        -    -    -    -    -   10    -   86
DISCUSSION


The results are encouraging in that the task and stimuli seem
appropriate for further testing of aural classification ability.
Listeners learned to categorize stimuli when feedback was given,
and this learning generalized to probe stimuli for which no
feedback was given (in the experimental setting). Although some
skepticism is warranted because of the listeners' familiarity with
the signals prior to testing, it should be noted that the
listeners were generally unaware of whether a particular stimulus
was an exemplar or probe until after feedback was (or was not)
given. Thus, it is unlikely that the listeners used a separate
strategy for the probe stimuli based on remembered
classifications of them. It is still possible, however, that due
to their prior experience, the listeners learned more general
categories than would be learned by a naive listener who would
only learn the categories from the exemplars. It is certainly to
be expected that naive listeners will require much more training
than was required for the listeners in this experiment. It is also
possible that learning may be facilitated by grouping the
confusable stimuli during the initial training phase. That is,
stimuli 1, 2, 3, and 7 could be used as one set and 4, 5, 6, and 8
as another, rather than 1-4 and 5-8 as was done in this
experiment.

Several other observations are noteworthy concerning
attentional processes that facilitate classification of these
signal events. First, the one-second stimulus duration was chosen
as a minimal duration which allowed a clear perception of the
event. With briefer durations the stimuli were more difficult to
hear, possibly because of the proximity of the onset or offset to
the event. For some stimuli, the percept was noticeably less
clear when the onset was less than 300 msec from the event.

Second, the signal events were often much clearer in the 1-sec
presentation than when presented in the full context of an ongoing
stream of information. This observation held even when the full
context of 8-25 sec was played repeatedly and the listener could
attend to the specific event. Whether this effect is due to
temporal uncertainty or the competing presence of other perceptual
information is unclear. This effect presumably reflects the
processing time required for classification in combination with
memory limitations on the amount of sensory information that can
be stored in preprocessed/precategorized form. Finally, a form of
automaticity develops for the categorization of these events.
Classification becomes less effortful and less conscious. In
fact, after repeated exposure to these sounds, nonexperimental
sounds occurring in the normal environment seemed to grab the
listener's attention rather than being perceived incidentally.
Although anecdotal, these observations bear on the complexity of
the classification process and attentional mechanisms that
determine the salience of particular information.
CONCLUSIONS


Listeners could accurately classify a set of 50 brief sonar
signals. Even for stimuli where no feedback was given, accuracy
was generally good and the errors that were made were similar to
those made on stimuli with feedback. The results when no feedback
was given suggest that listeners learn a perceptual category
rather than merely developing the ability to assign responses to
individual stimuli. Thus, these preliminary results are
encouraging in indicating that this stimulus set and
classification task can be used to study aural classification with
real acoustic signals. Future experiments can manipulate these
signals to identify the importance of particular stimulus
information and can modify the task to examine attentional
factors.

The  present  results  will  be  incorporated  into  a  technical 
report  evaluating  the  importance  of  envelope  cues  for  aural 
classification. 